@vahid-ahmadi
Collaborator

@vahid-ahmadi vahid-ahmadi commented May 23, 2025

Fixes #246

Implement ASEC Undocumented Algorithm (paper)

Algorithm Logic: Process of Elimination

  1. Start: Assume all foreign-born non-citizens might be undocumented (code 0)
  2. Remove: People with evidence of legal status → move to code 3 ("OTHER_NON_CITIZEN")
  3. Result: Those remaining in code 0 have no clear indicators of legal status → likely undocumented
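The process of elimination can be sketched with boolean masks. This is a minimal illustration, not the PR's code: `eliminate` is a hypothetical helper, and the evidence masks stand in for the 14 conditions listed below. Only people still in code 0 are touched, and a person flagged by any condition moves to code 3.

```python
import numpy as np

# SSN card type codes as used in this PR
NONE = 0                   # no clear indicator of legal status (likely undocumented)
CITIZEN = 1
NON_CITIZEN_VALID_EAD = 2
OTHER_NON_CITIZEN = 3


def eliminate(ssn_card_type: np.ndarray, evidence_masks: list[np.ndarray]) -> np.ndarray:
    """Move code-0 people with any evidence of legal status to code 3 (hypothetical helper)."""
    result = ssn_card_type.copy()
    # A person leaves code 0 if ANY condition flags them
    any_evidence = np.logical_or.reduce(evidence_masks)
    result[(result == NONE) & any_evidence] = OTHER_NON_CITIZEN
    return result


codes = np.array([CITIZEN, NONE, NONE, NONE])
medicare = np.array([False, True, False, False])
pre_1982 = np.array([True, False, False, True])
print(eliminate(codes, [medicare, pre_1982]))  # [1 3 0 3]
```

The citizen keeps code 1 even though a mask flags them, because only code-0 rows are eligible for reassignment.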

Target Implementation

Modify the existing add_ssn_card_type() function to apply these conditions before the random refinement step, so that anyone meeting any condition is assigned code 3 ("OTHER_NON_CITIZEN") instead of possibly remaining in code 0 ("NONE", i.e. undocumented).

The 14 Conditions

Condition 1: Pre-1982 Arrivals

  • Logic: Remove those who arrived before 1982 (eligible for IRCA amnesty)
  • Variable: PEINUSYR codes 1-7 (Before 1950 through 1980-1981)
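A sketch of Condition 1 as a pandas mask, assuming a person-level DataFrame with the raw `PEINUSYR` column (the function name is illustrative). It also assumes code 0 means "not in universe" (not foreign-born), which `between(1, 7)` excludes.

```python
import pandas as pd


def pre_1982_arrival(person: pd.DataFrame) -> pd.Series:
    # PEINUSYR codes 1-7 cover "Before 1950" through "1980-1981"
    return person["PEINUSYR"].between(1, 7)


example = pd.DataFrame({"PEINUSYR": [0, 1, 7, 8]})
print(pre_1982_arrival(example).tolist())  # [False, True, True, False]
```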

Condition 2: Eligible Naturalized Citizens

  • Logic: Remove naturalized citizens who meet time/age requirements
  • Variables: PRCITSHP == 4, A_AGE >= 18, PEINUSYR (for years in US), A_MARITL, A_SPOUSE

Condition 3: Medicare Recipients

  • Logic: Remove those with Medicare coverage
  • Variable: MCARE == 1
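Several conditions (3, 6, 8, 9 and 11) reduce to a single variable equal to 1, so they can share one table-driven helper. A minimal sketch, assuming the column names given in the condition list; `SIMPLE_FLAG_CONDITIONS` and `simple_flag_masks` are hypothetical names.

```python
import pandas as pd

# Conditions that reduce to "variable == 1" (column names from the condition list)
SIMPLE_FLAG_CONDITIONS = {
    "Condition 3 - Medicare": "MCARE",
    "Condition 6 - IHS": "IHSFLG",
    "Condition 8 - CHAMPVA": "CHAMPVA",
    "Condition 9 - Military health insurance": "MIL",
    "Condition 11 - Social Security": "SS_YN",
}


def simple_flag_masks(person: pd.DataFrame) -> dict[str, pd.Series]:
    """Return one boolean mask per single-flag condition."""
    return {name: person[col] == 1 for name, col in SIMPLE_FLAG_CONDITIONS.items()}
```

Keeping the conditions in a dictionary also makes it easy to log a per-condition count, as the CI output later in this thread does.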

Condition 4: Federal Retirement Benefits

  • Logic: Remove those receiving federal government pensions
  • Variables: PEN_SC1 == 3 OR PEN_SC2 == 3 (Federal government pension)

Condition 5: Social Security Disability

  • Logic: Remove those receiving SS disability benefits
  • Variables: RESNSS1 == 2 OR RESNSS2 == 2 (disabled adult or child)
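Conditions 4 and 5 share a pattern: a benefit source is recorded in two columns, and either one may hold the relevant code. A sketch assuming the codes stated above (3 = federal government pension, 2 = disabled adult or child); the helper names are hypothetical.

```python
import pandas as pd


def either_column_equals(person: pd.DataFrame, columns: list[str], code: int) -> pd.Series:
    """True where any of the listed source columns holds the given code."""
    mask = pd.Series(False, index=person.index)
    for col in columns:
        mask |= person[col] == code
    return mask


def federal_pension(person: pd.DataFrame) -> pd.Series:  # Condition 4
    return either_column_equals(person, ["PEN_SC1", "PEN_SC2"], 3)


def ss_disability(person: pd.DataFrame) -> pd.Series:  # Condition 5
    return either_column_equals(person, ["RESNSS1", "RESNSS2"], 2)
```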

Condition 6: Indian Health Service Coverage

  • Logic: Remove those with IHS coverage
  • Variable: IHSFLG == 1

Condition 7: Medicaid Recipients (State-specific adjustments needed)

  • Logic: Remove Medicaid recipients (with state policy exceptions)
  • Variables: CAID == 1, GESTFIPS (for state-specific rules)
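The state-specific adjustment could take the form of an exclusion list: Medicaid receipt counts as evidence of legal status only outside certain states. This PR does not say which states, so `exception_states` below is a hypothetical input, and `medicaid_evidence` is an illustrative name.

```python
import pandas as pd


def medicaid_evidence(person: pd.DataFrame, exception_states: set[int]) -> pd.Series:
    """Condition 7 sketch: Medicaid receipt, ignored in the given states.

    exception_states: FIPS codes (GESTFIPS values) of states where Medicaid
    receipt should NOT count as evidence. Hypothetical; not specified in the PR.
    """
    on_medicaid = person["CAID"] == 1
    in_exception_state = person["GESTFIPS"].isin(exception_states)
    return on_medicaid & ~in_exception_state
```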

Condition 8: CHAMPVA Recipients

  • Logic: Remove those with CHAMPVA coverage
  • Variable: CHAMPVA == 1

Condition 9: Military Health Insurance

  • Logic: Remove those with TRICARE/military coverage
  • Variable: MIL == 1

Condition 10: Government Employees

  • Logic: Remove government workers and military personnel
  • Variables: PEIO1COW codes 1-3 (federal/state/local gov) OR A_MJOCC == 11 (military)

Condition 11: Social Security Recipients

  • Logic: Remove those receiving Social Security payments
  • Variable: SS_YN == 1

Condition 12: Housing Assistance (State-specific adjustments needed)

  • Logic: Remove housing assistance recipients (with mixed-status family exceptions)
  • Variables: Household-level HPUBLIC == 1 OR HLORENT == 1, GESTFIPS
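Because HPUBLIC and HLORENT live on the household record, they have to be broadcast to persons. A sketch assuming the usual CPS ASEC linkage, where the person record's `PH_SEQ` matches the household record's `H_SEQ` (treat that as an assumption); the state-specific and mixed-status exceptions are omitted here.

```python
import pandas as pd


def housing_assistance(person: pd.DataFrame, household: pd.DataFrame) -> pd.Series:
    """Condition 12 sketch: flag every person in an assisted household."""
    hh_flag = (household["HPUBLIC"] == 1) | (household["HLORENT"] == 1)
    assisted_ids = household.loc[hh_flag, "H_SEQ"]
    # Broadcast the household-level flag via the household sequence number
    return person["PH_SEQ"].isin(assisted_ids)
```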

Condition 13: Veterans/Military Personnel

  • Logic: Remove veterans and active military
  • Variables: PEAFEVER == 1 OR A_MJOCC == 11
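Conditions 10 and 13 both use `A_MJOCC == 11` (military), so they overlap. A sketch using the codes from the list above; the function names are hypothetical. When all masks are ORed, as in the elimination step, a person flagged by both is still moved only once, even though they appear in both per-condition tallies.

```python
import pandas as pd

MILITARY_OCCUPATION = 11  # A_MJOCC code for military, per the condition list


def government_employee(person: pd.DataFrame) -> pd.Series:  # Condition 10
    gov = person["PEIO1COW"].between(1, 3)  # federal, state, or local government
    return gov | (person["A_MJOCC"] == MILITARY_OCCUPATION)


def veteran_or_military(person: pd.DataFrame) -> pd.Series:  # Condition 13
    return (person["PEAFEVER"] == 1) | (person["A_MJOCC"] == MILITARY_OCCUPATION)
```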

Condition 14: SSI Recipients

  • Logic: Remove those receiving SSI for themselves (not on behalf of children)
  • Variables: SSI_YN == 1, RESNSSI1/RESNSSI2 (to verify recipient)
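Condition 14 needs the reason codes to separate SSI received for oneself from SSI received on behalf of a child. The exact codes are not given here, so this sketch takes them as a hypothetical `on_behalf_codes` parameter and also assumes 0 means "not in universe".

```python
import pandas as pd


def ssi_for_self(person: pd.DataFrame, on_behalf_codes: set[int]) -> pd.Series:
    """Condition 14 sketch: SSI recipients with at least one 'for self' reason."""
    receives_ssi = person["SSI_YN"] == 1

    def reason_is_self(col: str) -> pd.Series:
        reason = person[col]
        # Assumptions: 0 = not in universe; on_behalf_codes = received for someone else
        return (reason != 0) & ~reason.isin(on_behalf_codes)

    return receives_ssi & (reason_is_self("RESNSSI1") | reason_is_self("RESNSSI2"))
```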

Additional Steps

Family Correlation Adjustment

@vahid-ahmadi vahid-ahmadi changed the title "add conditions" to "Enhance SSN undocumented type imputation" May 23, 2025
@vahid-ahmadi vahid-ahmadi self-assigned this May 23, 2025
@MaxGhenis
Contributor

How many undocumented do we get at the end of it (in the CPS)?

@vahid-ahmadi
Collaborator Author

Undocumented SSN population in the CPS:

  • Family correlation (80%) and random refinement (30%): 3.0 million
  • Family correlation (80%) and random refinement (10%): 4.1 million
  • Family correlation (80%) and random refinement (80%): 0.8 million
  • Family correlation (50%) and random refinement (50%): 2.1 million
  • Family correlation (100%) and random refinement (0%): 4.6 million

Also, here is the population in each category before and after applying the 14 refinement conditions (original CPS data):
[Image: population in each SSN category before and after the 14 conditions]

@MaxGhenis
Contributor

OK, we need to adjust to hit the Pew estimate of 11 million (might be more today).

@vahid-ahmadi
Collaborator Author

I adjusted the code to hit the Pew estimate of 11 million and the JCT estimate of 2 million for the reconciliation reform.

@vahid-ahmadi vahid-ahmadi requested review from MaxGhenis and removed request for MaxGhenis May 28, 2025 15:21
@MaxGhenis
Contributor

Also, depending on the target year, let's target the total undocumented population per these projections:

| Year | Millions of people | Source / basis |
|------|--------------------|----------------|
| 2022 | 11.0 | Official DHS Office of Homeland Security Statistics estimate for 1 Jan 2022 (Office of Homeland Security Statistics) |
| 2023 | 12.2 | Center for Migration Studies ACS-based residual estimate, published May 2025 (CMS) |
| 2024 | 13.0 | Reuters synthesis of experts ahead of the 2025 change of administration ("~13-14 million"); central value used here (Reuters) |
| 2025 | 13.0 | Same midpoint carried forward; CBP data show a 95% drop in new border apprehensions after Feb 2025, implying near-zero net growth for the year (U.S. Customs and Border Protection) |
| 2026 | 13.1 | Author's projection (see explanation below) |
| 2027 | 13.2 | Author's projection |
| 2028 | 13.3 | Author's projection |
| 2029 | 13.3 | Author's projection |
| 2030 | 13.4 | Author's projection |
| 2031 | 13.4 | Author's projection |
| 2032 | 13.3 | Author's projection |
| 2033 | 13.2 | Author's projection |
| 2034 | 13.1 | Author's projection |
| 2035 | 13.0 | Author's projection |
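Replacing a fixed target with year-dependent targets amounts to a lookup table. A minimal sketch using the values above; `UNDOCUMENTED_TARGETS` and `undocumented_target` are hypothetical names, and clamping years outside 2022-2035 to the nearest endpoint is an assumption of this sketch, not something the thread specifies.

```python
# Total undocumented population targets (people), from the table above
UNDOCUMENTED_TARGETS = {
    2022: 11.0e6, 2023: 12.2e6, 2024: 13.0e6, 2025: 13.0e6,
    2026: 13.1e6, 2027: 13.2e6, 2028: 13.3e6, 2029: 13.3e6,
    2030: 13.4e6, 2031: 13.4e6, 2032: 13.3e6, 2033: 13.2e6,
    2034: 13.1e6, 2035: 13.0e6,
}


def undocumented_target(year: int) -> float:
    """Return the target for a year, clamping to the table's range (assumption)."""
    first, last = min(UNDOCUMENTED_TARGETS), max(UNDOCUMENTED_TARGETS)
    return UNDOCUMENTED_TARGETS[min(max(year, first), last)]
```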

@vahid-ahmadi
Collaborator Author

I've replaced the fixed 11 million target with dynamic targets.

@vahid-ahmadi vahid-ahmadi requested a review from MaxGhenis June 2, 2025 13:52
@vahid-ahmadi
Collaborator Author

(Screenshot, 2025-06-09)

Contributor

@nikhilwoodruff nikhilwoodruff left a comment

Since this is getting quite complex, could you add a page to the documentation with descriptions of the methodology, and some statistics of our results? e.g. splits of status counts.

@vahid-ahmadi
Collaborator Author

I added documentation. Once the implementation is reviewed, I will complete the doc with results.

vahid-ahmadi and others added 4 commits June 18, 2025 14:46
Remove git conflict markers and keep both test functions.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Add explicit UTF-8 encoding when reading/writing documentation files
to prevent UnicodeDecodeError on Windows systems.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@vahid-ahmadi vahid-ahmadi requested review from MaxGhenis, PavelMakarchuk and nikhilwoodruff and removed request for MaxGhenis June 19, 2025 11:22
@MaxGhenis
Contributor

Please make the family adjustment probabilistic. It should only move people into undocumented status enough to hit the overall target in the CPS, if we don't have enough without it. Millions of US citizen children live with an undocumented parent ("mixed-status families") and the current algorithm results in zero such cases AFAIUI.
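A minimal sketch of a target-limited family adjustment along the lines requested, assuming NumPy arrays of SSN codes, household ids, and weights. The function name and the random-order selection are illustrative choices, not the PR's implementation. It considers only code-3 members of households that already contain a code-0 member, and stops as soon as the weighted code-0 total reaches the target; if the target is already met, nobody moves.

```python
import numpy as np


def probabilistic_family_adjustment(codes, household_id, weight, target, rng):
    """Move randomly chosen code-3 relatives of code-0 people to code 0, only up to the target."""
    codes = codes.copy()
    shortfall = target - weight[codes == 0].sum()
    if shortfall <= 0:
        return codes  # target already met without family correlation
    households_with_undocumented = np.unique(household_id[codes == 0])
    candidates = np.flatnonzero((codes == 3) & np.isin(household_id, households_with_undocumented))
    order = rng.permutation(candidates)
    cumulative = np.cumsum(weight[order])
    # Take candidates until the cumulative weight first reaches the shortfall
    n_take = np.searchsorted(cumulative, shortfall) + 1
    codes[order[:n_take]] = 0
    return codes
```

Because selection stops at the target, most code-3 people in such households keep their status, so mixed-SSN-category households remain possible.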

Also for reference here's the output from CI:

Step 0 - Initial: Code 0 people: 319,547,413

Step 1 - Citizens: Moved 296,159,040 people to Code 1

ASEC Conditions - Current Code 0 people: 23,388,373
Condition 1 - Pre-1982 arrivals: 1,144,841 people qualify for Code 3
Condition 2 - Eligible naturalized citizens: 0 people qualify for Code 3
Condition 3 - Medicare recipients: 1,653,008 people qualify for Code 3
Condition 4 - Federal retirement benefits: 13,045 people qualify for Code 3
Condition 5 - Social Security disability: 143,057 people qualify for Code 3
Condition 6 - Indian Health Service coverage: 21,500 people qualify for Code 3
Condition 7 - Medicaid recipients: 4,398,182 people qualify for Code 3
Condition 8 - CHAMPVA recipients: 4,972 people qualify for Code 3
Condition 9 - Military health insurance: 114,583 people qualify for Code 3
Condition 10 - Government employees: 690,024 people qualify for Code 3
Condition 11 - Social Security recipients: 1,200,030 people qualify for Code 3
Condition 12 - Housing assistance: 872,241 people qualify for Code 3
Condition 13 - Veterans/Military personnel: 69,129 people qualify for Code 3
Condition 14 - SSI recipients: 187,168 people qualify for Code 3
After conditions - Code 0 people: 15,991,100
  - Undocumented workers before adjustment: 10,931,508 (target: 8,300,000)
  - Undocumented students before adjustment: 901,242 (target: 399,000)
Step 3 - EAD workers: Moved 2,660,438 people from Code 0 to Code 2
Step 4 - EAD students: Moved 514,767 people from Code 0 to Code 2
After EAD assignment - Code 0 people: 12,870,904
Step 5 - Family correlation: Changed 1,326,999 people from Code 3 to Code 0 in households with Code 0 members
After family correlation - Code 0 people: 14,197,903
Step 6 - Target refinement: Moved 1,164,053 people from Code 0 to Code 3
After target refinement - Code 0 people: 13,033,850

Final populations:
  Code 0 (NONE): 13,033,850
  Code 1 (CITIZEN): 296,159,040
  Code 2 (NON_CITIZEN_VALID_EAD): 3,120,197
  Code 3 (OTHER_NON_CITIZEN): 7,234,327
Total undocumented (Code 0): 13,033,850 (target: 13,000,000)
Population log saved to: /home/runner/work/policyengine-us-data/policyengine-us-data/policyengine_us_data/datasets/cps/../../../docs/asec_population_log.csv
Documentation updated with population numbers: /home/runner/work/policyengine-us-data/policyengine-us-data/policyengine_us_data/datasets/cps/../../../docs/asec_undocumented_algorithm.ipynb

@vahid-ahmadi
Collaborator Author

Addressed: the family adjustment is now probabilistic.

@nikhilwoodruff
Contributor

Documentation looks nice!

Contributor

@MaxGhenis MaxGhenis left a comment

Looks great! And I misspoke earlier: we're not adjusting citizenship so won't get the wrong mixed-status household composition. But still good to do the other parts probabilistically since mixed-SSN-category households may exist.

cps: h5py.File,
person: pd.DataFrame,
spm_unit: pd.DataFrame,
undocumented_target: float = 11e6,
Contributor

Plus a flat 13/11 multiplier for the next two.

Suggested change
undocumented_target: float = 11e6,
undocumented_target: float = 13e6,


if target_weighted_ead_workers > 0 and len(worker_ids) > 0:
# Sort workers by weight (heaviest first) to minimize assignments needed
worker_weights = person_weights[worker_ids]
Contributor

remove

total_weighted_workers - undocumented_workers_target
)

if target_weighted_ead_workers > 0 and len(worker_ids) > 0:
Contributor

Create a function to modularize the targeting here: for each of {workers, students, all others}, compare the total potential undocumented population to the undocumented target, then move enough people to documented status to align them.
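One possible shape for the suggested helper, written as a hypothetical `align_group_to_target`: for one group mask, it compares the weighted code-0 total with the group's target and moves just enough people out of code 0. The random ordering is an assumption of this sketch (the earlier snippet sorted workers by weight, which the review asked to remove).

```python
import numpy as np


def align_group_to_target(codes, weight, group_mask, target, new_code, rng):
    """Move enough code-0 members of one group to new_code to bring the group to its target."""
    codes = codes.copy()
    pool = np.flatnonzero(group_mask & (codes == 0))
    excess = weight[pool].sum() - target
    if excess <= 0:
        return codes  # group is already at or below its target
    order = rng.permutation(pool)
    cumulative = np.cumsum(weight[order])
    # Move people until the moved weight first covers the excess
    n_move = np.searchsorted(cumulative, excess) + 1
    codes[order[:n_move]] = new_code
    return codes
```

Following the CI log, it could be called three times: for workers and students with `new_code=2` (valid EAD) and for everyone else with `new_code=3`.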

@MaxGhenis MaxGhenis merged commit d8afcf2 into main Jun 23, 2025
8 checks passed
@MaxGhenis MaxGhenis deleted the ssn-impute-new branch June 23, 2025 14:48
Development

Successfully merging this pull request may close these issues.

Refine SSN imputation

4 participants